-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added streams to CSV reader and writer api #14340
Conversation
/ok to test |
/ok to test |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and clean API change.
I think we can hit a few more code paths in the tests :)
cpp/tests/streams/io/csv_test.cpp
Outdated
TEST_F(CSVTest, CSVReader) | ||
{ | ||
constexpr auto num_rows = 10; | ||
auto int8_values = random_values<int8_t>(num_rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this the same situation as with ORC, where we don't really need the random values.
cpp/tests/streams/io/csv_test.cpp
Outdated
.dtypes({cudf::data_type{cudf::type_id::INT8}, | ||
cudf::data_type{cudf::type_id::INT16}, | ||
cudf::data_type{cudf::type_id::INT32}, | ||
cudf::data_type{cudf::type_id::INT64}, | ||
cudf::data_type{cudf::type_id::UINT8}, | ||
cudf::data_type{cudf::type_id::UINT16}, | ||
cudf::data_type{cudf::type_id::UINT32}, | ||
cudf::data_type{cudf::type_id::UINT64}, | ||
cudf::data_type{cudf::type_id::FLOAT32}, | ||
cudf::data_type{cudf::type_id::FLOAT64}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't pass the data types the reader will launch an extra kernel to infer them, so we should not include this option :)
cpp/tests/streams/io/csv_test.cpp
Outdated
auto int16_values = random_values<int16_t>(num_rows); | ||
auto int32_values = random_values<int32_t>(num_rows); | ||
auto int64_values = random_values<int64_t>(num_rows); | ||
auto uint8_values = random_values<uint8_t>(num_rows); | ||
auto uint16_values = random_values<uint16_t>(num_rows); | ||
auto uint32_values = random_values<uint32_t>(num_rows); | ||
auto uint64_values = random_values<uint64_t>(num_rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this can all be a single column, the code path is the same
cpp/tests/streams/io/csv_test.cpp
Outdated
auto uint32_values = random_values<uint32_t>(num_rows); | ||
auto uint64_values = random_values<uint64_t>(num_rows); | ||
auto float32_values = random_values<float>(num_rows); | ||
auto float64_values = random_values<double>(num_rows); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a string column, there are additional thrust calls that post-process quotes (IIRC).
/ok to test |
Thank you for the feedback, @vuule. Changing the tests did reveal more code paths creating string scalars using the default stream, which are now fixed. |
/ok to test |
@shrshi you may want to resolve the merge conflicts in test CMake first to unblock CI. |
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tracking down the default stream use!
Couple more tiny changes
/ok to test |
/ok to test |
wrapped = cudf::make_strings_column( | ||
d_chars, d_offsets, d_bitmask, null_count, cudf::test::get_default_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include changes from other PR? The additional stream parameter here and below seems unrelated to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that CSV stream test is exercising this code path while no prior stream test was, so before this PR we didn't observe that there was a missing stream argument here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this change is needed for the string column in the tests - make_strings_column
is called with the default stream otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks!
/merge |
Description
This PR contributes to #13744.
-Added stream parameters to public APIs
cudf::io::read_csv
cudf::io::write_csv
-Added stream gtests
Checklist